Improving estimates of pertussis burden in Ontario, Canada 2010–2017 by combining validation and capture-recapture methodologies

An underestimation of pertussis burden has impeded understanding of transmission and disallows effective policy and prevention to be prioritized and enacted. Capture-recapture analyses can improve burden estimates; however, uncertainty remains around incorporating health administrative data due to accuracy limitations. The aim of this study is to explore the impact of pertussis case definitions and data accuracy on capture-recapture estimates. We used a dataset from March 7, 2010 to December 31, 2017 comprised of pertussis case report, laboratory, and health administrative data. We compared Chao capture-recapture abundance estimates using prevalence, incidence, and adjusted false positive case definitions. The latter was developed by removing the proportion of false positive physician billing code-only case episodes after validation. We calculated sensitivity by dividing the number of observed cases by abundance. Abundance estimates demonstrated that a high proportion of cases were missed by all sources. Under the primary analysis, the highest sensitivity of 78.5% (95% CI 76.2–80.9%) for those less than one year of age was obtained using all sources after adjusting for false positives, which dropped to 43.1% (95% CI 42.4–43.8%) for those one year of age or older. Most code-only episodes were false positives (91.0%), leading to considerably lower abundance estimates and improvements in laboratory testing and case report sensitivity using this definition. Accuracy limitations can be accounted for in capture-recapture analyses using different case definitions and adjustment. The latter enhanced the validity of estimates, furthering the utility of capture-recapture methods to epidemiological research. Findings demonstrated that all sources consistently fail to detect pertussis cases. This is differential by age, suggesting ascertainment and testing bias. Results demonstrate the value of incorporating real time health administrative data into public health surveillance if accuracy limitations can be addressed.


Introduction
Pertussis remains one of the most common vaccine-preventable diseases in Canada [1].In 2019, the last year before the Covid-19 pandemic, the reported incidence of pertussis in Canada was 5.64 per 100,000 [1].Despite being a reportable disease, an underestimation of cases and deaths has impeded understanding of transmission.Canada relies on a passive surveillance system for pertussis, with local public health units responsible for reporting cases to provincial health authorities that were in turn reported from laboratories, health practitioners, hospital administrators, schools, and other institutions [2].As a result, ascertainment bias is a key concern.This occurs when atypical cases are underdiagnosed, including older individuals experiencing mild disease [3,4].This issue is worsened by testing bias, with younger, severe cases more likely to be tested, have a positive result, and be reported [5,6].It is suspected that milder, undetected cases may be sustaining pertussis transmission in Canadian communities [4].
Burden estimates vary regionally based on interactions between case definitions, the type of surveillance and data available, practitioner knowledge, immunization programs, and the extent of local transmission [5,7].Underestimation may be attributed to failing to consider pertussis diagnostically, atypical presentations, infrequent diagnostic testing, suboptimal test accuracy, lack of uniformity in case definitions, and reporting issues [3,6].In combination with complicated epidemiological characteristics, pertussis surveillance is consequently challenging [5,8].However, monitoring burden is essential for informing and assessing the impact of immunization programs and policy [5,[7][8][9][10].
To improve pertussis burden estimates, one strategy is to supplement surveillance data with health administrative data [11].When several sources are available, capture-recapture analyses can be used to better estimate burden [12].This analytic approach has been recently used to assess completeness of contact-tracing for Ebola and detection of Covid-19 infections [13,14].For pertussis, capture-recapture has been used to estimate the number of deaths in England and the number of cases in Ontario, Canada [11,15].The latter study estimated that 21-73% of cases have been missed using combined surveillance, laboratory, and health administrative data [11].However, considerable uncertainty in estimation remained around the validity of using health administrative data, and particularly Ontario Health Insurance Plan (OHIP) physician billing diagnostic codes.The aim of this study is to evaluate the impact of using different pertussis case definitions on capture-recapture estimates including abundance and data source sensitivity, with the goal of enhancing the utility of this method for improving burden estimates to inform public health surveillance, prevention, and policy.

Methods
The University of Toronto's Health Sciences Research Ethics Board (37885) and the Public Health Ontario (PHO) Ethics Review Board (2019-006.02)approved this study.The need for consent was waived by the ethics committees.Data were linked and analyzed at ICES (formerly the Institute of Clinical Evaluative Sciences) using unique encoded identifiers.The dataset from this study is held securely in coded form at ICES.While legal data sharing agreements between ICES and data providers (e.g., healthcare organizations and government) prohibit ICES from making the dataset publicly available, access may be granted to those who meet pre-specified criteria for confidential access, available at https://www.ices.on.ca/DAS (email: das@ices.on.ca).

Data sources
We obtained Public Health Information System (iPHIS) and PHO Laboratory Information System (Labware) data from a PHO linked dataset previously used to improve estimates of pertussis burden in Ontario [11].iPHIS data contained confirmed, probable, and "does not meet" (DNM) pertussis case reports.We considered reports greater than 365 days apart a new case, with data available from April 1, 2006 to March 31, 2015.When duplicates occurred in the same year, we gave priority to the highest level of confirmation.Labware data included positive, indeterminate, and negative pertussis laboratory tests, with cases defined as at least one positive result by PCR or culture.We counted positive results more than 90 days apart as a new case, and data were available from December 7, 2009 to March 31, 2015.Time intervals for data extraction were selected due to the availability of each data source, with the study period limited by Labware data collection beginning in 2009.iPHIS entries prior to December 1, 2009 were excluded accordingly (n = 3464).
We updated the PHO dataset until March 31, 2018 and combined it with health administrative data from December 1, 2009 from three databases held at ICES: the Canadian Institute for Health Information (CIHI) Discharge Abstract Database (DAD); the CIHI National Ambulatory Care Reporting System (NACRS); and the Ontario Health Insurance Plan (OHIP) Claims Database (Table 1).Labware and iPHIS entries that were unable to link to a unique ICES key number (IKN) after using both deterministic and probabilistic linkage methods were excluded (n = 1020).Collected health administrative data included ICD-10 codes A37.0 (whooping cough, Bordetella pertussis) and A37.9 (whooping cough, unspecified species) from hospitalizations and emergency room visits and OHIP diagnostic billing code 033 (whooping cough, Bordetella pertussis).We restricted OHIP claims to billings from homes, offices, and long-term care facilities.The Registered Persons Database (RPDB) was used to obtain data on patient sex, age, and date of death.We excluded health administrative data entries with same-day immunizations (n = 8305) as they are unlikely to reflect a true pertussis case.To do so, we used identified pertussis data entries with pertussis-containing immunization codes G840, G841, and G847 and general immunization codes G538 and G539 documented on the same day.We also excluded same-day duplicate entries for individuals (n = 260), as only one entry a day at maximum would reflect a true pertussis case.We excluded all entries with no index date or a date of death prior to the index date, as well as participants with an invalid unique identifier or who were missing their sex or birth date (n = 471).

Case definitions and sensitivity analyses
We developed three case definitions to separately input into capture-recapture models-period prevalence, incidence with exclusions, and false-positive adjusted (Table 2).Period prevalence permitted an individual to have one entry per data source over the study period.We did not apply a time limit to recapture in other sources, leading to the best recapture scenario.For incidence with exclusions, we incorporated data entries into episodes using 90-day rules and ruled out administrative data-only episodes with a negative pertussis laboratory test within 28 days of the episode start (Table 2 and S1 Fig).This structure required recapture in other sources to occur within the 90 days before or after the respective episode.Finally, for the last definition we eliminated the proportion of false positive OHIP code-only episodes from the incidence with exclusions case definition.To do so, we validated these episodes to obtain a positive predictive value (PPV) and took a random sample based on the estimated proportion of true positives.We conducted validation using a previously developed methodology and cohort from the Electronic Medical Record Primary Care (EMRPC) database (S1 Appendix) [16,17].The PPV was estimated as 8.99%, leading to the removal of 41,062 OHIP-code only entries for the primary analysis.
To ensure that failure to recapture was not due to data missingness, we applied a study window based on data availability.We removed cases with a first date before March 7, 2010 and last date after December 31, 2017 to give a buffer of 90 days at each end to ensure an accurate episode start and end date, which we defined as the earliest and latest date for an episode in any data source.For the primary analysis, we included iPHIS confirmed cases, Labware positive PCR or culture results, and administrative data.However, we applied three sensitivity analyses to the three data structures described above to explore further uncertainty in case definitions.For the first sensitivity analysis, we incorporated iPHIS probable cases (n = 537).iPHIS probable and DNM cases were included in the second sensitivity analysis in addition to indeterminate laboratory tests (n = 1957).Episodes were also ruled out if the episode only included administrative data or DNM entries and there was a negative pertussis laboratory test within 28 days of the episode start, reflecting a possible false positive entry.For the final sensitivity analysis, we excluded entries with A37.9 ICD-10 codes (n = 5749), meaning the pertussis species was unspecified.For each sensitivity analysis, the new datasets with entries included or excluded were input into the different case definition structures prior to capture-recapture analysis.

Capture-recapture analyses
Capture-recapture estimates abundance using the cases identified in each source and their overlap to calculate the number missed by all [18].We assumed the population was closed and that temporality was present [11,19].We combined all administrative data into a single source at the outset, assuming and thereby accounting for dependency [12].We assessed other dependencies by calculating the probability of being captured in one source given being in another [11].Additionally, we evaluated random detection using Pearson's chi-squared tests of the observed versus expected number of cases in a pair of sources [11].Both were assessed under the prevalence structure to ensure the assumption of independence for these tests was not violated.We used these results to select a dependency structure in combination with theoretical considerations [20].We evaluated heterogeneity by assessing the linearity of heterogeneity graphs [19].We used models that accounted for a closed population, temporality, and heterogeneity if determined to be present.The latter was expected as milder, older cases of pertussis are less likely to be tested, have a positive result, and be reported to surveillance [5,6].We explicitly built the selected dependency structure into models by including two-way interaction coefficients for sources hypothesized to be dependent [12].We chose Chao's lower bound estimator for total sample size for the capture-recapture models, which additionally accounts for dependency [12].We compared model estimates to those from Darroch models, which further correct for heterogeneity [21].We rounded estimates of abundance to the nearest whole number.We considered results statistically significant at alpha � 0.05 and we used the Rcapture package in R [19,22].

Estimated sensitivity
We calculated sensitivity by dividing the number of cases identified by each data source by the estimated abundance [11].For the incidence and adjusted false positive case definitions, there was concern that correlation (clustering) would arise from having multiple cases for some individuals, impacting the sensitivity point estimate and variance [23,24].To address this, sensitivity was additionally calculated under both definitions by selecting a random episode per person to remove the effect of clustering.We then compared these results to those including multiple episodes, with little difference used as evidence that estimates were robust.

Capture-recapture results
Under the prevalence case definition, all sources were dependent based on pair-wise probabilities and Pearson's chi-squared test was highly significant (p < 0.00001), suggesting non-random detection.This provided support for the theorized dependency structure, and we used all two-way interaction terms to account for dependency between each source pair.To ensure comparability between case definitions, we used this structure for all capture-recapture models.A visual depiction of dependency is presented in Fig 1 using the degree of overlap between data sources under prevalence.Heterogeneity graphs lacked linearity, indicating heterogeneity was present and had to be accounted for in modelling (S2 Fig) .Estimated abundance was similar for those less than one year of age under the definitions for period prevalence and incidence with exclusions, at 3810 (95% CI 2932-5707) and 3078 (95% CI 2362-4609) respectively (Table 3).Abundance was considerably lower with less variability, as measured by the width of the 95% CIs, for the adjusted false positive definition at 1151 (95% CI 964-1538).Overall, the one year or older age group had more variable estimates.Abundance for this age group was again similar under prevalence and incidence, at 114,135 (95% CI 87,228-155,391) and 132,528 (95% CI 100,384-181,775).The false positive definition had substantially lower estimates and variability at 20,490 (95% CI 15,998-27,319).Darroch models produced identical abundance estimates in scenarios with more than one two-way interaction term.

Sensitivity analyses
After the addition of probable iPHIS cases, there was little change to estimated abundance (Table 3).Including iPHIS probable and DNM cases and indeterminate laboratory tests produced the largest increase in abundance and introduced a considerable amount of variability.While this pattern occurred under all case definitions, the highest estimate of 252,872 (95% CI 209,058-308,891) was obtained under incidence for those one year of age or older.After removing A37.9 codes, abundance estimates increased and decreased without a reliable pattern.

Estimated sensitivity
Sensitivity estimates are available in Table 4. Results were comparable and generally within a few percentage points with and without multiple episodes per individual (S1 Table ).As a result, clustering was determined to have a minimal effect and we reported sensitivity estimates that included multiple episodes per individual.Using all data sources consistently provided the highest sensitivity (Table 4).Labware had the lowest sensitivity while administrative data had the highest for a single source.Under the primary analysis and prevalence definition, the highest sensitivity for those less than one year of age was 69.2% (95% CI 67.7-70.7%),which dropped to 40.6% (95% CI 40.3-40.8%)for the older age group.Incidence sensitivity estimates were similar but slightly lower in comparison, excepting marginally higher estimates for iPHIS and Labware for the younger age group.Overall, using the adjusted false positive definition increased sensitivity, with all sources under the primary analysis producing a sensitivity of 78.5% (95% CI 76.2-80.9%)and 43.1% (95% CI 42.4-43.8%)for the younger and older age groups respectively.However, the largest increase to sensitivity occurred for iPHIS and Labware estimates, although sensitivity for these sources remained low for the older age group.
Any change to sensitivity estimates after the addition of probable iPHIS cases was small, within 5%.After including iPHIS probable and DNM cases and indeterminate laboratory

Discussion
Abundance estimates demonstrated that a high proportion of pertussis cases were missed by all sources.Results were similar when using prevalence and incidence, but after adjusting for physician billing code-only false positives abundance dropped considerably.While this occurred for both age groups, the effect was greater among those one year of age or older.The low estimated PPV of physician billing code-only episodes provides evidence that false positives have been inflating observed counts and abundance estimates from capture-recapture analyses of pertussis in Ontario, which decreased sensitivity for laboratory and case report data.As a result, the adjusted case definition is the most valid out of those tested, with the described approach useful for improving the utility of capture-recapture methods to epidemiological surveillance.This procedure details how to incorporate sources of health administrative data to improve passive pertussis surveillance estimates, while accounting for validity issues inherent in such secondary data.In the future, the estimated PPV may be used for adjustment prior to conducting capture-recapture analyses.Alternatively, the validation strategy may be used to calculate unique validity estimates prior to adjustment, with extensions to other diseases and jurisdictions.Regardless of the case definition, health administrative data had the highest sensitivity for a single source, with all sources combined producing the best sensitivity.This establishes the value of incorporating health administrative data into pertussis surveillance if accuracy and timeliness limitations are addressed.Laboratory tests had the lowest sensitivity and particularly for the older age group, indicating testing bias is present.Public health case reports had slightly higher sensitivity but displayed a similar pattern.While sensitivity estimates improved for both after adjusting for false positives, this was primarily in the younger age group.Overall, sensitivity was substantially lower for the older group, suggesting ascertainment bias is present.
A 2018 Ontario capture-recapture study used the same data sources but over fewer years.As a result, abundance is not directly comparable as the higher estimates from this study are expected due to the extra years of data.However, sensitivity for all sources with probable case reports included was estimated at 54% and 39% for the younger and older age groups respectively.This is comparable to sensitivity estimates for the older age group in this study after including iPHIS probable cases, but sensitivity was higher for the younger group (~66%).This could be due to improved recapture in this subgroup using the case definitions or differences in how the 2018 study modelled dependencies [11].The 2018 study reported considerable uncertainty persisting around physician billing code accuracy and investigated by using different proportions of true positives for all health administrative data, with the lowest at 25%.This dropped abundance estimates by 66% and 73% for the younger and older age groups [11].We were able to address remaining uncertainty through validation of physician billing code-only episodes.Interestingly, applying the estimated PPV of 8.99% to these episodes reduced abundance estimates similarly to assuming a PPV of 25% for all health administrative data in the 2018 study, by 64% and 84%.The higher latter value indicates a greater proportion of code only-episodes in the older age group, leading to enhanced improvement in recapture once removed.Laboratory test and case report sensitivity considerably improved using this case definition.
A modelling study based on pertussis incidence in southern Ontario from 1993-2004 estimated that five to 33,032 cases remain undetected per reported case depending on age [4].It has been stated elsewhere that the true number of pertussis cases is at least three times higher than what is reported [3].To compare, estimates from this study should be calculated using the false positive case definition to avoid inflating underdetection.For case report data, these values are 2.5 and 8.9 for the younger and older age groups under the primary analysis and 4.7 and 12 after including probable and DNM case reports and indeterminate laboratory tests.Using all data sources, 1.3 and 2.3 cases were missed per observed case for the younger and older age groups, which increased to 2.4 and 3.8 with the additional case reports and laboratory tests.The substantially lower upper limit compared to the modelling study is likely due to differences in age groups, pertussis incidence during the respective time periods, and diagnostic testing accuracy, with PCR introduced after 2004.In addition, the estimation approaches differ, with the modelling study using methods with considerable uncertainty as reflected by the wide range of values.While this study used the more conservative Chao's lower bound estimator, similar results were obtained from Darroch models.Furthermore, a simulation study found that Chao's methods estimated abundance within 75-82% of the total population size in most complicated scenarios [12].Regardless, this study's findings are in line with past estimates of underdetection, with reasonable explanations for remaining differences.
The estimated PPV for physician billing code-only episodes of 8.99% (95% CI 1.59-16.39%) is comparable to the PPV of 13.6% (95% CI 9.28-17.9%)obtained for an OHIP physician billing code algorithm within the EMRPC using the same cohort [17].The slightly higher PPV in the EMRPC can be explained by using prevalent cases, increasing the likelihood of concordance between codes and cases.Additionally, EMRPC billing codes were only collected from physician offices, potentially decreasing the number of false positives.OHIP pertussis cases billed at homes or long-term care facilities are unlikely to be documented in EMRPC's primary care patient records.Further contributing to this issue is that visits outside office settings such as walk-in or specialist visits fail to be captured in the EMRPC, leading to about 15% of interactions being missed [25].In addition, only two thirds of the laboratory tests in OHIP are documented in the EMRPC [25].Missing any of these data in the EMRPC could artificially decrease the PPV of OHIP billing code-only episodes by increasing the number of false positives, although billings from outside office settings were uncommon in the OHIP database and unlikely to greatly affect validation.The EMRPC study only tested the accuracy of data available in the EMRPC, meaning sensitivity estimates for emergency room visits, hospitalizations, and case report data are not available.However, pertussis laboratory test sensitivity using prevalent cases was reported as 0.64% (95% CI 0.37-1.09%)across all ages [17].After adjusting for false positives, 8.0% sensitivity (95% CI 7.62-8.38%)was obtained for the older age group (which only excludes infants) using prevalence.This difference can be explained by variation in ages, pertussis classification, and validation methods.Additionally, Labware has more comprehensive coverage, with the EMRPC noted to have decreased laboratory test sensitivity through incomplete documentation [17].
One limitation of this study is that we did not validate physician billing code-only episodes separately for the younger and older age groups.This was to preserve an adequate sample size, with the assumption that the PPV averages out across age groups and this is an appropriate strategy for taking a random sample based on a proportion.In addition, while demonstrating that sensitivity estimates differ by age group, it is unlikely that PPV would vary to the same extent.PPV evaluates the proportion of true positives out of test positives and is primarily affected by prevalence, not testing bias [26].Although it may appear that younger individuals have a higher risk of pertussis infection, due to testing and ascertainment bias this may not actually be the case [5].Older cases are hypothesized to be an important source of pertussis, which is evident through older relatives being key sources of transmission to infants [5,27].Additionally, in 2019, 62% of reported cases in Ontario occurred in those ten years of age or older [28].However, it may be of interest to allow the PPV to vary separately by age group in future analyses.
An additional limitation is that we had to remove EMRPC cases without dates during validation.We considered it preferable to avoid introducing misclassification rather than preserving the sample size, and there is no reason to suspect excluded cases are systematically different from the majority of those included.A further limitation of validation is that we were unable to adjust the PPV and sensitivity estimates for clustering under the incidence and false positive definitions due to the low sample size and study methodology respectively [23,24].To address this, we compared point estimates and variances to those using a single episode per individual to assess the effect of correlation, with little difference found between PPV estimates.While some sensitivity estimates were statistically significantly different, the absolute difference was small, and this was for the older age group where large sample sizes produced substantial precision.As a result, we concluded that these differences were unlikely to be clinically significant.While it is possible correlation still minorly affected the findings, it is a study strength to be able to report sensitivity estimates under different case definitions.Little validation research has incorporated repeated episodes, which is of interest for acute diseases.
Finally, missing data may impact abundance and sensitivity estimates, but this is an inherent limitation of using secondary data and beyond the scope of this study.This includes failing to collect data on certain cases, such as those that are milder in nature who do not seek health care or may be misdiagnosed.While pertussis infectivity is related to severity, meaning that milder undetected cases are likely less infectious, it is possible that many are still important to transmission.While Labware data only covers pertussis laboratory testing for < 95% of Ontario, we assumed that the remaining tests would not considerably impact abundance estimates [11,29].To assess the effect of missing data, we conducted sensitivity analyses incorporating additional data to determine the robustness of capture-recapture abundance estimates.This produced little change to results, except when including probable and DNM cases in addition to indeterminate laboratory tests.Doing so decreased sensitivity and led to greater similarity in capture patterns between the younger and older age groups.This could suggest that older individuals are less likely to meet the confirmed case definition, or that milder infections are frequently missed in younger as well as older individuals.Alternatively, it could be due to increased uncertainty in abundance among both groups under this analysis, stemming from greater uncertainty in true pertussis status.

Conclusions
This study demonstrated how limited health data accuracy can be accounted for using capture-recapture analyses that employ different pertussis case definitions.The false-positive adjusted case definition helped address past uncertainty in burden estimation and produced results which align with the degree of underdetection reported in the literature.The described methodology allows passive pertussis case report data to be supplemented with health administrative data to better estimate burden, while accounting for validity issues that accompany secondary data use.Improved capture-recapture estimates can better inform public health policy and prevention.Findings consistently demonstrated that data sources are failing to detect pertussis cases, and particularly laboratory and case report data.The best sensitivity was obtained by using all sources together, with health administrative data having the highest sensitivity for a single source.This indicates the benefit of incorporating real time health administrative data into surveillance if misclassification can be addressed.The results provide further support that pertussis detection differs by age, indicating that ascertainment and testing bias is present in data.

Table 2 . Descriptions of case definitions and number of observed pertussis cases by analysis.
Individuals were only permitted to have one entry per data source over the study period.Recapture could occur at any time over the study period.Entries were incorporated into episodes using 90-day rules, after which administrative data-only episodes were ruled out in cases with a negative pertussis laboratory test within 28 days of the episode start.Recapture had to occur within 90 days before or after the respective episode.